AITopics | model mismatch

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

van Zutphen, Menno, Herceg, Domagoj, Delimpaltadakis, Giannis, Antunes, Duarte J.

Tempering the Bayes Filter towards Improved Model-Based Estimation

arXiv.org Machine LearningDec-3-2025

Model-based filtering is often carried out while subject to an imperfect model, as learning partially-observable stochastic systems remains a challenge. Recent work on Bayesian inference found that tempering the likelihood or full posterior of an imperfect model can improve predictive accuracy, as measured by expected negative log likelihood. In this paper, we develop the tempered Bayes filter, improving estimation performance through both of the aforementioned, and one newly introduced, modalities. The result admits a recursive implementation with a computational complexity no higher than that of the original Bayes filter. Our analysis reveals that -- besides the well-known fact in the field of Bayesian inference that likelihood tempering affects the balance between prior and likelihood -- full-posterior tempering tunes the level of entropy in the final belief distribution. We further find that a region of the tempering space can be understood as interpolating between the Bayes- and MAP filters, recovering these as special cases. Analytical results further establish conditions under which a tempered Bayes filter achieves improved predictive performance. Specializing the results to the linear Gaussian case, we obtain the tempered Kalman filter. In this context, we interpret how the parameters affect the Kalman state estimate and covariance propagation. Empirical results confirm that our method consistently improves predictive accuracy over the Bayes filter baseline.

bayes filter, likelihood, posterior, (15 more...)

arXiv.org Machine Learning

2512.02823

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Ohio (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Neural Information Processing SystemsNov-21-2025, 15:46:43 GMT

Reinforcement Learning under Model Mismatch

We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs to the model-free Reinforcement Learning setting, where we do not have access to the model parameters, but can only sample states from it. We define robust versions of Q-learning, Sarsa, and TD-learning and prove convergence to an approximately optimal robust policy and approximate value function respectively. We scale up the robust algorithms to large MDPs via function approximation and prove convergence under two different settings. We prove convergence of robust approximate policy iteration and robust approximate value iteration for linear architectures (under mild assumptions). We also define a robust loss function, the mean squared robust projected Bellman error and give stochastic gradient descent algorithms that are guaranteed to converge to a local minimum.

model mismatch, name change, reinforcement learning, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

arXiv.org Artificial IntelligenceNov-4-2025

Real-DRL: Teach and Learn in Reality

Mao, Yanbing, Cai, Yihao, Sha, Lui

This paper introduces the Real-DRL framework for safety-critical autonomous systems, enabling runtime learning of a deep reinforcement learning (DRL) agent to develop safe and high-performance action policies in real plants (i.e., real physical systems to be controlled), while prioritizing safety! The Real-DRL consists of three interactive components: a DRL-Student, a PHY-Teacher, and a Trigger. The DRL-Student is a DRL agent that innovates in the dual self-learning and teaching-to-learn paradigm and the real-time safety-informed batch sampling. On the other hand, PHY-Teacher is a physics-model-based design of action policies that focuses solely on safety-critical functions. PHY-Teacher is novel in its real-time patch for two key missions: i) fostering the teaching-to-learn paradigm for DRL-Student and ii) backing up the safety of real plants. The Trigger manages the interaction between the DRL-Student and the PHY-Teacher. Powered by the three interactive components, the Real-DRL can effectively address safety challenges that arise from the unknown unknowns and the Sim2Real gap. Additionally, Real-DRL notably features i) assured safety, ii) automatic hierarchy learning (i.e., safety-first learning and then high-performance learning), and iii) safety-informed batch sampling to address the learning experience imbalance caused by corner cases. Experiments with a real quadruped robot, a quadruped robot in NVIDIA Isaac Gym, and a cart-pole system, along with comparisons and ablation studies, demonstrate the Real-DRL's effectiveness and unique features.

artificial intelligence, equation, machine learning, (16 more...)

2511.00112

Country: North America > United States > Illinois (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Transportation > Ground > Road (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.87)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.68)

Neural Information Processing SystemsOct-9-2025, 18:50:36 GMT

A Unified Principle of Pessimism for Offline Reinforcement Learning under Model Mismatch

To tackle these issues, we propose a unified principle of pessimism using distribu-tionally robust Markov decision processes.

arxiv preprint arxiv, log 4sa, probability, (13 more...)

Country:

North America > United States > Florida > Orange County > Orlando (0.14)
North America > United States > New York > Erie County > Buffalo (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsOct-9-2025, 14:24:46 GMT

our response to Reviewer 2. We will also include the suggested references for Bayesian UQ methods

We greatly thank the reviewers for their constructive comments. However, in certain applications, there is scientific evidence for a parametric form of the triggering kernel (e.g., [Beggs We show here a small example with 5 nodes in Figure 1. We have similar observations for recovering other edges in this example. In each picture, the two blue curves outline the proposed CIs, and the two red curves outline the asymptotic CI. Moreover, we will adjust Section 2-3 as suggested.

artificial intelligence, machine learning, reviewer 2, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Jaszczuk, Mateusz, Figueroa, Nadia

Rapid Mismatch Estimation via Neural Network Informed Variational Inference

arXiv.org Artificial IntelligenceAug-29-2025

With robots increasingly operating in human-centric environments, ensuring soft and safe physical interactions, whether with humans, surroundings, or other machines, is essential. While compliant hardware can facilitate such interactions, this work focuses on impedance controllers that allow torque-controlled robots to safely and passively respond to contact while accurately executing tasks. From inverse dynamics to quadratic programming-based controllers, the effectiveness of these methods relies on accurate dynamics models of the robot and the object it manipulates. Any model mismatch results in task failures and unsafe behaviors. Thus, we introduce Rapid Mismatch Estimation (RME), an adaptive, controller-agnostic, probabilistic framework that estimates end-effector dynamics mismatches online, without relying on external force-torque sensors. From the robot's proprioceptive feedback, a Neural Network Model Mismatch Estimator generates a prior for a Variational Inference solver, which rapidly converges to the unknown parameters while quantifying uncertainty. With a real 7-DoF manipulator driven by a state-of-the-art passive impedance controller, RME adapts to sudden changes in mass and center of mass at the end-effector in $\sim400$ ms, in static and dynamic settings. We demonstrate RME in a collaborative scenario where a human attaches an unknown basket to the robot's end-effector and dynamically adds/removes heavy items, showcasing fast and safe adaptation to changing dynamics during physical interaction without any external sensory system.

artificial intelligence, machine learning, mismatch, (18 more...)

2508.21007

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Giacomuzzo, Giulio, Abdelwahab, Mohamed, Calì, Marco, Libera, Alberto Dalla, Carli, Ruggero

A Robust Controller based on Gaussian Processes for Robotic Manipulators with Unknown Uncertainty

arXiv.org Artificial IntelligenceJul-16-2025

In this paper, we propose a novel learning-based robust feedback linearization strategy to ensure precise trajectory tracking for an important family of Lagrangian systems. We assume a nominal knowledge of the dynamics is given but no a-priori bounds on the model mismatch are available. In our approach, the key ingredient is the adoption of a regression framework based on Gaussian Processes (GPR) to estimate the model mismatch. This estimate is added to the outer loop of a classical feedback linearization scheme based on the nominal knowledge available. Then, to compensate for the residual uncertainty, we robustify the controller including an additional term whose size is designed based on the variance provided by the GPR framework. We proved that, with high probability, the proposed scheme is able to guarantee asymptotic tracking of a desired trajectory. We tested numerically our strategy on a 2 degrees of freedom planar robot.

artificial intelligence, machine learning, trajectory, (18 more...)

2507.1117

Country: Europe > Italy (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.41)

Neural Information Processing SystemsMay-26-2025, 16:33:39 GMT

A Unified Principle of Pessimism for Offline Reinforcement Learning under Model Mismatch

In this paper, we address the challenges of offline reinforcement learning (RL) under model mismatch, where the agent aims to optimize its performance through an offline dataset that may not accurately represent the deployment environment. We identify two primary challenges under the setting: inaccurate model estimation due to limited data and performance degradation caused by the model mismatch between the dataset-collecting environment and the target deployment one. To tackle these issues, we propose a unified principle of pessimism using distributionally robust Markov decision processes. We carefully construct a robust MDP with a single uncertainty set to tackle both data sparsity and model mismatch, and demonstrate that the optimal robust policy enjoys a near-optimal sub-optimality gap under the target environment across three widely used uncertainty models: total variation, \chi 2 divergence, and KL divergence. Our results improve upon or match the state-of-the-art performance under the total variation and KL divergence models, and provide the first result for the \chi 2 divergence model.

artificial intelligence, machine learning, offline reinforcement learning, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Beikmohammadi, Ali, Khirirat, Sarit, Richtárik, Peter, Magnússon, Sindri

Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis

arXiv.org Artificial IntelligenceMar-21-2025

Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents. However, many existing FedRL algorithms assume that all agents operate in identical environments, which is often unrealistic. In real-world applications -- such as multi-robot teams, crowdsourced systems, and large-scale sensor networks -- each agent may experience slightly different transition dynamics, leading to inherent model mismatches. In this paper, we first establish linear convergence guarantees for single-agent temporal difference learning (TD(0)) in policy evaluation and demonstrate that under a perturbed environment, the agent suffers a systematic bias that prevents accurate estimation of the true value function. This result holds under both i.i.d. and Markovian sampling regimes. We then extend our analysis to the federated TD(0) (FedTD(0)) setting, where multiple agents -- each interacting with its own perturbed environment -- periodically share value estimates to collaboratively approximate the true value function of a common underlying model. Our theoretical results indicate the impact of model mismatch, network connectivity, and mixing behavior on the convergence of FedTD(0). Empirical experiments corroborate our theoretical gains, highlighting that even moderate levels of information sharing can significantly mitigate environment-specific errors.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

2503.17454

Genre: Research Report (0.69)

Industry: Information Technology > Security & Privacy (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.87)